Exercises
- Use NGPhylogeny.fr to analyze the set of rRNA sequences provided. Describe the methods and put the .png file from your analysis into your Lab 8 .Rmd file.
Steps
- Go to https://ngphylogeny.fr/
- Select “One Click”
- Upload/paste genetic sequences from Moodle file
- Click submit
- Save Tree as png file
- Save Tree as newick file for later use
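The saved Newick file can be read back into R later. The sketch below is only an illustration: it uses a tiny made-up tree (`newick_text`) and a temporary file in place of the real NGPhylogeny.fr export, whose file name is not given here. `read.tree()` and `write.tree()` are functions from the ape package.

```r
library(ape)

# Hypothetical stand-in for the tree exported from NGPhylogeny.fr
newick_text <- "((A:1,B:1):1,C:2);"
toy_tree <- read.tree(text = newick_text)

# Write the tree to a temporary .nwk file, then read it back,
# the same way the saved Newick file would be loaded in a later session
nwk_file <- tempfile(fileext = ".nwk")
write.tree(toy_tree, file = nwk_file)
reloaded <- read.tree(nwk_file)

# Confirm the tree survived the round trip
Ntip(reloaded)
reloaded$tip.label
```

For the real file, replace `nwk_file` with the path where the Newick file was saved.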

- Align the sequences and run a phylogenetic analysis in CIPRES using MAFFT and FastTreeMP. You will need to click on Parameter Set and Save even if you don’t change the parameters. Download the fastree_result.tre to your computer. Put the resulting tree file in your .Rmd file.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.4 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggtree)
## Warning: package 'ggtree' was built under R version 3.6.3
## Registered S3 method overwritten by 'treeio':
## method from
## root.phylo ape
## ggtree v2.0.4 For help: https://yulab-smu.github.io/treedata-book/
##
## If you use ggtree in published research, please cite the most appropriate paper(s):
##
## - Guangchuang Yu, Tommy Tsan-Yuk Lam, Huachen Zhu, Yi Guan. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution 2018, 35(12):3041-3043. doi: 10.1093/molbev/msy194
## - Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 2017, 8(1):28-36, doi:10.1111/2041-210X.12628
##
## Attaching package: 'ggtree'
## The following object is masked from 'package:tidyr':
##
## expand
tree <- read.tree("./data/lab-8-data/TOL_fastree_result.tre")
tree
##
## Phylogenetic tree with 21 tips and 19 internal nodes.
##
## Tip labels:
## Archaeoglobus_fulgidus, Trypanosoma_cruzi_nuclear, Amphidinium_carterae, Saccharomyces_cerevisiae_nuclear, Homo_sapies_nuclear, Drosophila_yakuba_nuclear, ...
## Node labels:
## , 0.999, 0.893, 1.000, 0.988, 0.982, ...
##
## Unrooted; includes branch lengths.
ggtree(tree) +
theme_tree2() +
geom_tiplab() +
xlim(0,2)
## Warning: `data_frame()` is deprecated as of tibble 1.1.0.
## Please use `tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Warning: `mutate_()` is deprecated as of dplyr 0.7.0.
## Please use `mutate()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
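The tree summary above reports an unrooted tree whose node labels look like support values between 0 and 1 (FastTree writes local support values into the Newick output). A minimal sketch of how those labels could be inspected, assuming a small made-up tree (`toy`) and an arbitrary cutoff of 0.9 chosen only for illustration:

```r
library(ape)

# Hypothetical unrooted toy tree with support values stored as node labels
toy <- read.tree(text = "((A:0.1,B:0.2)0.95:0.3,(C:0.1,D:0.1)0.60:0.2,E:0.5);")

# A basal trifurcation means the tree is treated as unrooted
is.rooted(toy)

# Node labels are stored as text; an unlabeled node becomes NA after conversion
support <- suppressWarnings(as.numeric(toy$node.label))

# Internal nodes are numbered after the tips, so add Ntip() to the label index
weak_nodes <- which(support < 0.9) + Ntip(toy)
weak_nodes
```

The same conversion should apply to `tree$node.label` from the FastTree result, with the cutoff adjusted to taste.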

- Go through the tutorial on Visualizing and Annotating Phylogenetic Trees with R+ggtree adding the steps to your .Rmd file.
Load the tidyverse and ggtree packages and read in the tree data
library(tidyverse)
library(ggtree)
tree <- read.tree("./data/lab-8-data/tree_newick.nwk")
tree
##
## Phylogenetic tree with 13 tips and 12 internal nodes.
##
## Tip labels:
## A, B, C, D, E, F, ...
##
## Rooted; includes branch lengths.
Build a ggtree
# build a ggplot with a geom_tree
ggplot(tree) +
geom_tree() +
theme_tree()

# This is convenient shorthand
ggtree(tree)

Add a scale to the ggtree
# add a scale
ggtree(tree) +
geom_treescale()

# or add the entire scale to the x axis with theme_tree2()
ggtree(tree) +
theme_tree2()

Create a cladogram instead of a phylogram
ggtree(tree, branch.length="none")

Change aesthetics of the ggtree
ggtree(tree, branch.length="none", color="blue", size=2, linetype=3)

Exercise 1
- Create a slanted phylogenetic tree
- Create a circular phylogenetic tree
- Create a circular unscaled cladogram with thick red lines
ggtree(tree, layout="slanted")

ggtree(tree, layout="circular")

ggtree(tree, layout="circular", branch.length="none", color="red", size=2)

Add additional layers to the ggtree
# create the basic plot
p <- ggtree(tree)
# add node points
p + geom_nodepoint()

# add tip points
p + geom_tippoint()

# Label the tips
p + geom_tiplab()

Exercise 2 - Create a phylogeny with the following aesthetics
- tips labeled in purple
- purple-colored diamond-shape tip points (hint: Google search “R point characters”)
- large semitransparent yellow node points (hint: alpha=)
- Add a title with + ggtitle(…)
ggtree(tree) +
geom_tiplab(color="purple") +
geom_tippoint(color="purple", fill="purple", shape=23) +
geom_nodepoint(color="yellow", alpha=0.8, size=4) +
ggtitle("Aesthetically-pleasing phylogenetic tree")

Add node labels to ggtree
ggtree(tree) +
geom_text(aes(label=node), hjust=-.3)

Re-create the plot to decide which taxa to use for the MRCA
ggtree(tree) +
geom_tiplab()

Get the internal node number of the MRCA of C and E
MRCA(tree, "C", "E")
## [1] 17
Get the internal node number of the MRCA of G and H
MRCA(tree, "G", "H")
## [1] 21
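The node numbers returned above follow the numbering used by ape's `phylo` objects: tips are numbered 1 to N and internal nodes continue from N + 1, with the root at N + 1. A small sketch with a made-up four-tip tree (`toy`), using ape's `getMRCA()` rather than ggtree's `MRCA()`:

```r
library(ape)

# Hypothetical four-tip tree: two cherries joined at the root
toy <- read.tree(text = "((A,B),(C,D));")

# The root is the first internal node, numbered Ntip + 1
root_node <- Ntip(toy) + 1

# A and C sit in different cherries, so their MRCA is the root
mrca_ac <- getMRCA(toy, c("A", "C"))

# A and B share a cherry, so their MRCA is an internal node below the root
mrca_ab <- getMRCA(toy, c("A", "B"))

c(root = root_node, A_C = mrca_ac, A_B = mrca_ab)
```

This is why the MRCA values for the 13-tip tutorial tree are all larger than 13.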
Label clade 17
ggtree(tree) +
geom_cladelabel(node=17, label="Some random clade", color="red")

Adjust positioning of clade label
ggtree(tree) +
geom_tiplab() +
geom_cladelabel(node=17, label="Some random clade",
color="red2", offset=.8)

Add another clade label
ggtree(tree) +
geom_tiplab() +
geom_cladelabel(node=17, label="Some random clade",
color="red2", offset=.8) +
geom_cladelabel(node=21, label="A different clade",
color="blue", offset=.8)

Align clade labels and adjust scale of x-axis to see labels
ggtree(tree) +
geom_tiplab() +
geom_cladelabel(node=17, label="Some random clade",
color="red2", offset=.8, align=TRUE) +
geom_cladelabel(node=21, label="A different clade",
color="blue", offset=.8, align=TRUE) +
theme_tree2() +
xlim(0, 70) +
theme_tree()

Highlight clades instead of labeling them
ggtree(tree) +
geom_tiplab() +
geom_hilight(node=17, fill="gold") +
geom_hilight(node=21, fill="purple")

Show evolutionary events by connecting taxa
ggtree(tree) +
geom_tiplab() +
geom_taxalink("E", "H", color="blue3") +
geom_taxalink("C", "G", color="orange2", curvature=-.9)

Exercise 3 - Produce the figure below
- First, find what the MRCA is for taxa B+C, and taxa L+J. You can do this in one of two ways:
- Easiest: use MRCA(tree, tip=c("taxon1", "taxon2")) for B/C and L/J separately.
- Alternatively: use ggtree(tree) + geom_text(aes(label=node), hjust=-.3) to see what the node labels are on the plot. You might also add tip labels here too.
MRCA(tree, "B", "C")
## [1] 19
MRCA(tree, "L", "J")
## [1] 23
- Draw the tree with ggtree(tree).
ggtree(tree)

- Add tip labels.
ggtree(tree) +
geom_tiplab()

- Highlight these clades with separate colors.
ggtree(tree) +
geom_tiplab() +
geom_hilight(node=19, fill="purple") +
geom_hilight(node=23, fill="gold")

- Add a clade label to the larger superclade (node=17) that we saw before that includes A, B, C, D, and E. You’ll probably need an offset to get this looking right.
ggtree(tree) +
geom_tiplab() +
geom_hilight(node=19, fill="purple") +
geom_hilight(node=23, fill="gold") +
geom_cladelabel(node=17, label="Clade 17",
color="red2", offset=.8, align=TRUE) +
theme_tree2() +
xlim(0, 70) +
theme_tree()

- Link taxa C to E, and G to J with a dashed gray line (hint: get the geom working first, then try changing the aesthetics. You’ll need linetype=2 somewhere in the geom_taxalink()).
ggtree(tree) +
geom_tiplab() +
geom_hilight(node=19, fill="purple") +
geom_hilight(node=23, fill="gold") +
geom_cladelabel(node=17, label="Clade 17",
color="red2", offset=.8, align=TRUE) +
theme_tree2() +
xlim(0, 70) +
theme_tree() +
geom_taxalink("C", "E", color="gray", linetype=2) +
geom_taxalink("G", "J", color="gray", linetype=2)

- Add a scale bar to the bottom by changing the theme.
ggtree(tree) +
geom_tiplab() +
geom_hilight(node=19, fill="purple") +
geom_hilight(node=23, fill="gold") +
geom_cladelabel(node=17, label="Clade 17",
color="red2", offset=.8, align=TRUE) +
theme_tree2() +
xlim(0, 70) +
theme_tree() +
geom_taxalink("C", "E", color="gray", linetype=2) +
geom_taxalink("G", "J", color="gray", linetype=2) +
theme_tree2()

- Add a title.
ggtree(tree) +
geom_tiplab() +
geom_hilight(node=19, fill="purple") +
geom_hilight(node=23, fill="gold") +
geom_cladelabel(node=17, label="Clade 17",
color="red2", offset=.8, align=TRUE) +
theme_tree2() +
xlim(0, 70) +
theme_tree() +
geom_taxalink("C", "E", color="gray", linetype=2) +
geom_taxalink("G", "J", color="gray", linetype=2) +
theme_tree2() +
ggtitle("Exercise 3")

- Optionally, go back to the original ggtree(tree, …) call and change the layout to “circular”.
ggtree(tree, layout="circular") +
geom_tiplab() +
geom_hilight(node=19, fill="purple") +
geom_hilight(node=23, fill="gold") +
geom_cladelabel(node=17, label="Clade 17",
color="red2", offset=.8, align=TRUE) +
theme_tree2() +
xlim(0, 70) +
theme_tree() +
theme_tree2() +
ggtitle("Exercise 3")

Read the influenza tree file from BEAST and plot it with ggtree
library(treeio)
## treeio v1.10.0 For help: https://yulab-smu.github.io/treedata-book/
##
## If you use treeio in published research, please cite:
##
## LG Wang, TTY Lam, S Xu, Z Dai, L Zhou, T Feng, P Guo, CW Dunn, BR Jones, T Bradley, H Zhu, Y Guan, Y Jiang, G Yu. treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Molecular Biology and Evolution 2019, accepted. doi: 10.1093/molbev/msz240
# Read the data
tree <- read.beast("./data/lab-8-data/flu_tree_beast.tree")
# supply a most recent sampling date so you get the dates
# and add a scale bar
ggtree(tree, mrsd="2013-01-01") +
theme_tree2()

# Finally, add tip labels and adjust axis
ggtree(tree, mrsd="2013-01-01") +
theme_tree2() +
geom_tiplab(align=TRUE, linesize=.5) +
xlim(1990, 2020)
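The `mrsd` argument anchors the time axis at the most recent sampling date, so older nodes land at earlier years. The sketch below only illustrates that idea in base R and is not ggtree's internal code; `decimal_year()` is a hypothetical helper and the `heights` values are made up.

```r
# Hypothetical helper: convert a date string to a decimal year
decimal_year <- function(date_string) {
  d <- as.Date(date_string)
  year <- as.integer(format(d, "%Y"))
  year_start <- as.Date(paste0(year, "-01-01"))
  year_end <- as.Date(paste0(year + 1L, "-01-01"))
  # Fraction of the year elapsed, accounting for leap years
  year + as.numeric(d - year_start) / as.numeric(year_end - year_start)
}

mrsd_decimal <- decimal_year("2013-01-01")

# Made-up node heights (years before the most recent sample)
heights <- c(0, 1.5, 10, 22.3)

# Position on the time axis = most recent sampling date minus height
tip_times <- mrsd_decimal - heights
tip_times
```

Under this reading, a node with height 22.3 would fall just before 1991, which is consistent with an axis running from 1990 to 2020.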

Plot the FASTA multiple sequence alignment alongside the ggtree
msaplot(p=ggtree(tree), fasta="./data/lab-8-data/flu_aasequence.fasta", window=c(150, 175))

Change the above plot to a circular plot
msaplot(p=ggtree(tree), fasta="./data/lab-8-data/flu_aasequence.fasta", window=c(150, 175)) +
coord_polar(theta="y")

Generate 3 replicates of 4 random trees and then use facet_wrap to facet ggtrees
set.seed(42)
trees <- lapply(rep(c(10, 25, 50, 100), 3), rtree)
class(trees) <- "multiPhylo"
ggtree(trees) + facet_wrap(~.id, scales="free", ncol=4) + ggtitle("Many trees. Such phylogenetics. Wow.")

Plot three distinct plots using the function facet_plot()
# Generate a random tree with 30 tips
tree <- rtree(30)
# Make the original plot
p <- ggtree(tree)
# generate some random values for each tip label in the data
d1 <- data.frame(id=tree$tip.label, val=rnorm(30, sd=3))
# Make a second plot with the original, naming the new plot "dot",
# using the data you just created, with a point geom.
p2 <- facet_plot(p, panel="dot", data=d1, geom=geom_point, aes(x=val), color='red3')
# Make some more data with another random value.
d2 <- data.frame(id=tree$tip.label, value = abs(rnorm(30, mean=100, sd=50)))
# Now add to that second plot, this time using the new d2 data above,
# This time showing a bar segment, size 3, colored blue.
p3 <- facet_plot(p2, panel='bar', data=d2, geom=geom_segment,
aes(x=0, xend=value, y=y, yend=y), size=3, color='blue4')
# Show all three plots with a scale
p3 + theme_tree2()
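`facet_plot()` lines up each row of the data with a tip by matching the first column of the data frame against the tip labels. A quick sanity check before plotting might look like the sketch below; the tree and data here (`toy`, `d`) are made up, and the check itself is an assumption about good practice, not part of the tutorial.

```r
library(ape)

set.seed(1)
# Hypothetical small random tree and matching per-tip data
toy <- rtree(5)
d <- data.frame(id = toy$tip.label, val = rnorm(5))

# Tips with no data row, and data rows with no matching tip
missing_ids <- setdiff(toy$tip.label, d$id)
extra_ids <- setdiff(d$id, toy$tip.label)

# TRUE when every tip has exactly the ids the data frame provides
ids_match <- length(missing_ids) == 0 && length(extra_ids) == 0
ids_match
```

If `ids_match` is FALSE, the mismatched labels in `missing_ids` and `extra_ids` show which rows would not be drawn next to a tip.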

Create a ggtree with silhouette images
newick <- "((Pongo_abelii,(Gorilla_gorilla_gorilla,(Pan_paniscus,Pan_troglodytes)Pan,Homo_sapiens)Homininae)Hominidae,Nomascus_leucogenys)Hominoidea;"
tree <- read.tree(text=newick)
d <- ggimage::phylopic_uid(tree$tip.label)
d$body_mass = c(52, 114, 47, 45, 58, 6)
p <- ggtree(tree) %<+% d +
geom_tiplab(aes(image=uid, colour=body_mass), geom="phylopic", offset=2.5) +
geom_tiplab(aes(label=label), offset = .2) + xlim(NA, 7) +
scale_color_viridis_c()
## Loading required package: ggimage
p

- Upload your tree file from the FastTreeMP output on CIPRES. Color the tree according to the domains of life. Upload a circular version of the tree to your notebook.
library(tidyverse)
library(ggtree)
tree <- read.tree("./data/lab-8-data/TOL_fastree_result.tre")
tree
##
## Phylogenetic tree with 21 tips and 19 internal nodes.
##
## Tip labels:
## Archaeoglobus_fulgidus, Trypanosoma_cruzi_nuclear, Amphidinium_carterae, Saccharomyces_cerevisiae_nuclear, Homo_sapies_nuclear, Drosophila_yakuba_nuclear, ...
## Node labels:
## , 0.999, 0.893, 1.000, 0.988, 0.982, ...
##
## Unrooted; includes branch lengths.
ggtree(tree) +
theme_tree2() +
geom_tiplab() +
xlim(0,2) +
geom_nodelab()

MRCA(tree, "Oryza_mitochondrion", "Thermotoga_lettingae_")
## [1] 30
MRCA(tree, "Drosophila_yakuba_nuclear", "Trypanosoma_cruzi_nuclear")
## [1] 25
MRCA(tree, "Candidatus_Korarchaeum_cryptofilum_", "Archaeoglobus_fulgidus")
## [1] 22
ggtree(tree, layout="circular") +
theme_tree2() +
geom_tiplab(hjust=-.1, size = 3) +
xlim(0, 2) +
geom_nodelab() +
geom_hilight(node=22, fill="green") +
geom_hilight(node=30, fill="gold", extend=0.23) +
geom_hilight(node=25, fill="purple")
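Before highlighting a node, it can help to confirm which tips actually fall under it. The sketch below uses ape's `getMRCA()` and `extract.clade()` on a made-up five-tip tree (`toy`); applying the same calls to the FastTree result with nodes 22, 25, and 30 is left as an assumption, since membership in an unrooted tree depends on how it is drawn.

```r
library(ape)

# Hypothetical five-tip tree
toy <- read.tree(text = "((A,B),(C,(D,E)));")

# Find the node that joins C and E
node <- getMRCA(toy, c("C", "E"))

# Pull out the subtree below that node and list its tips
clade_tips <- extract.clade(toy, node)$tip.label
clade_tips
```

If a tip expected in a domain is missing from `clade_tips`, the highlighted node probably needs to be chosen with a different pair of taxa.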
